Towards Problem-dependent Optimal Learning Rates

Neural Information Processing Systems

We study problem-dependent rates, i.e., generalization errors that scale tightly with the variance or the effective loss at the "best hypothesis." Existing uniform convergence and localization frameworks, the most widely used tools to study this problem, often fail to simultaneously provide parameter localization and optimal dependence on the sample size. As a result, existing problem-dependent rates are often rather weak when the hypothesis class is "rich" and the worst-case bound of the loss is large. In this paper, we propose a new framework based on a "uniform localized convergence" principle. We provide the first (moment-penalized) estimator that achieves the optimal variance-dependent rate for general "rich" classes; we also establish an improved loss-dependent rate for standard empirical risk minimization.
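To make the idea of penalizing by a moment of the loss concrete, here is a minimal Python sketch of generic sample variance penalization over a small finite set of hypotheses. It is not the paper's estimator: the function name `variance_penalized_choice`, the `penalty_weight` parameter, and the penalty form (square root of sample variance divided by sample size) are assumptions chosen for illustration only.

```python
import math
from statistics import mean, variance


def variance_penalized_choice(losses_by_hypothesis, penalty_weight=1.0):
    """Pick the hypothesis minimizing empirical mean loss plus a variance penalty.

    losses_by_hypothesis maps a hypothesis name to its per-sample losses
    (at least two samples each). The penalty form is an illustrative
    assumption, not the estimator analyzed in the paper.
    """
    best_name, best_score = None, math.inf
    for name, losses in losses_by_hypothesis.items():
        n = len(losses)
        # Empirical risk plus a term that shrinks with n and grows with variance.
        score = mean(losses) + penalty_weight * math.sqrt(variance(losses) / n)
        if score < best_score:
            best_name, best_score = name, score
    return best_name


if __name__ == "__main__":
    data = {
        "steady": [0.5, 0.5, 0.5, 0.5],
        "noisy": [0.0, 1.0, 0.0, 0.9],
    }
    # Plain empirical risk minimization (no penalty) vs. the penalized choice.
    print(variance_penalized_choice(data, penalty_weight=0.0))
    print(variance_penalized_choice(data, penalty_weight=1.0))
```

With `penalty_weight=0.0` the function reduces to plain empirical risk minimization, so the two calls show how a variance term can change which hypothesis is selected when the mean losses are close.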


Review for NeurIPS paper: Towards Problem-dependent Optimal Learning Rates

Clarity: The paper is easy to read, despite being a theoretical work. The authors introduce all of the key concepts and make the manuscript (relatively) self-contained (given the format, they do a good job of making the paper accessible). However, there are a lot of grammar mistakes and typos, so the whole manuscript has to be checked very carefully.


Review for NeurIPS paper: Towards Problem-dependent Optimal Learning Rates

The reviewers agree that this is an exciting and interesting paper which improves the best-known variance-dependent rates for statistical learning with nonparametric classes, and are all in favor of accepting. I hope the authors will pay attention to the typos and clarifications pointed out by the reviewers and address these in the final version of the paper. As reviewer 4 and the authors' response mention, the point about removing the \log(n) factor for VC classes is subtle, and this paper does not really remove this term unless we make specific assumptions on the value of V*. I would recommend the authors either expand the discussion about this and include a more detailed comparison with prior work, or minimize this claim.

